ABSTRACT

A  new  endmember  extraction  method  has  been  developed  that  is  based  on  a  convex  cone  model  for  representing  vector 
data.  The  endmembers  are  selected  directly  from  the  data  set.  The  algorithm  for  finding  the  endmembers  is  sequential: 
the  convex  cone  model  starts  with  a  single  endmember  and  increases  incrementally  in  dimension.  Abundance  maps  are 
simultaneously  generated  and  updated  at  each  step.  A  new  endmember  is  identified  based  on  the  angle  it  makes  with  the 
existing  cone.  The  data  vector  making  the  maximum  angle  with  the  existing  cone  is  chosen  as  the  next  endmember  to 
add  to  enlarge  the  endmember  set.  The  algorithm  updates  the  abundances  of  previous  endmembers  and  ensures  that  the 
abundances  of  previous  and  current  endmembers  remain  positive  or  zero.  The  algorithm  terminates  when  all  of  the  data 
vectors  are  within  the  convex  cone,  to  some  tolerance.  The  method  offers  advantages  for  hyperspectral  data  sets  where 
high  correlation  among  channels  and  pixels  can  impair  un-mixing  by  standard  techniques.  The  method  can  also  be 
applied  as  a  band-selection  tool,  finding  end-images  that  are  unique  and  forming  a  convex  cone  for  modeling  the 
remaining  hyperspectral  channels.  The  method  is  described  and  applied  to  hyperspectral  data  sets. 
1.  INTRODUCTION

We have developed a convex factorization technique that simultaneously generates sets of endmembers and endmember
abundances. It extends earlier work on band selection1 and linear mixing models2. The technique finds extreme vectors
within  a  data  set  and  uses  these  extreme  vectors  as  endmembers.  An  extreme  vector  is  a  vector  that  cannot  be 
represented  by  a  positive  linear  combination  of  other  vectors  in  the  data.  Non-extreme  vectors  can  be  modeled  by  a 
positive  linear  combination  of  extreme  vectors.  The  extreme  vectors  or  endmembers  form  a  convex  cone  that  contains 
the  remaining  data  vectors.  The  convex  cone  provides  a  linear  mixing  model  for  the  data  vectors,  with  the  positive 
coefficients  being  identified  with  the  abundance  of  the  endmember  in  the  mixture  model  of  a  data  vector.  If  the  positive 
coefficients  are  further  constrained  to  sum  to  one,  the  convex  cone  reduces  to  a  convex  hull  and  the  extreme  vectors  form 
a  simplex. 

Several endmember extraction procedures (ORASIS3, N-FINDR4 and Iterative Error Analysis (IEA)5) have been
developed.  These  autonomous  algorithms  were  recently  compared6  to  each  other  and  to  the  interactive  pixel  purity  index 
method7.  ORASIS  is  a  suite  of  codes  that  finds  endmembers  from  a  scene  autonomously.  It  uses  a  Modified  Gram 
Schmidt  (MGS)  algorithm  to  factor  the  data  matrix  and  then  a  shrink-wrapping  technique  to  find  an  outer  simplex8,9  that 
encloses  the  data.  The  extreme  points  of  the  outer  simplex  need  not  be  data  points.  After  the  endmembers  have  been 
found,  a  constrained  linear  mixing  model  can  be  used  to  obtain  material  abundance  maps.  Alternatively,  to  maintain 
speed  for  real  time  processing,  ORASIS  has  the  option  of  skipping  the  shrink-wrapping,  using  the  vectors  selected  by  the 
MGS  procedure  and  a  set  of  filter  vectors10  derived  from  unconstrained  least  squares.  In  this  mode,  the  ORASIS 
procedure  finds  an  orthogonal  basis  to  fit  the  data.  N-FINDR  is  an  end-member  code  that  runs  autonomously  and  finds 
pure  pixels  that  can  be  used  to  describe  the  mixed  pixels  in  the  scene.  The  algorithm  finds  an  inner  simplex  within  the 
data  and  selects  the  largest  volume  simplex.  After  the  end-member  determination  step,  N-FINDR  uses  a  constrained 
linear mixing model to obtain abundances. The Iterative Error Analysis (IEA) approach performs a sequence of
constrained least-squares calculations, starting with the data spectrum that is least well modeled by the average and
selecting additional endmembers from the poorest modeled data points at each step. The method finds extreme points in
the data to use as endmembers. At each step a simplex is formed from the selected data points. The process terminates
when the number of endmembers sought has been found or a selected tolerance on residual error has been met. The selection
of extreme vectors in our factorization procedure is very similar to the IEA approach. The distinction is the limitation
of IEA to finding linearly independent endmembers and a single inner simplex to model all of the data. The three
methods find linearly independent sets of spectra as endmembers. The pixel abundances are obtained after selection via a
common  linear  mixing  model  for  all  pixels.  Many  hyperspectral  data  sets  and  most  multispectral  data  sets  contain  more 
extreme  vectors  than  the  rank  of  the  data.  For  such  cases,  these  endmember  procedures  find  and  utilize  only  a  subset  of 
the  extreme  vectors  in  the  data. 

To  avoid  linear  dependencies  and  ill-conditioned  least  squares  computations,  the  use  of  linear  mixing  models  has 
generally  been  limited  to  small  numbers  of  endmembers.  Applications  have  often  made  the  simplifying  assumption  that 
each material in the scene is describable by a single endmember. While a scene endmember pixel spectrum is unique,
there is not a one-to-one correspondence between the number of materials and the number of endmembers. An endmember
pixel may contain only one material or it may contain a high percentage of a single material in the scene together with
a unique combination of other materials. There are typically more endmembers than materials. For a given material,
there will be the most shadowed, the most highly solar-illuminated, the most or the least weathered, the most and least
chlorophyll-containing and, for infrared data, the highest and lowest temperature examples of the material. All of the
environmental and atmospheric variability leads to a potentially large set of endmember spectra for a single material.
These issues have been addressed recently11 by describing the variability of scene materials with bundles of spectra, and
using linear programming techniques to determine the abundances. An alternative is to use a regularization approach.2

In  Section  2,  the  method  is  described,  and  in  Section  3  it  is  illustrated  in  an  application  to  finding  spectral  endmembers 
and  an  example  band  selection  problem. 


2.  METHOD 


The  endmembers  and  factor  matrices  are  determined  sequentially.  At  each  cycle,  a  new  convex  cone  is  formed  by 
selecting  the  vector  from  the  original  matrix  that  lies  furthest  from  the  cone  defined  by  the  existing  basis,  and  adding  it  to 
the  basis.  A  constrained  projection  of  the  newly  selected  vector  is  performed  on  remaining  data  vectors.  The  procedure  is 
fast  and  many  endmembers  can  be  found  rapidly. 

2.1  Linear  expansion  of  HSI  data 

Hyperspectral  data  can  be  organized  in  matrix  form,  by  assigning  the  spectral  channels  to  matrix  rows  and  scene  pixel 
spectra  to  columns.  In  the  image  matrix,  H,  the  element,  Hij,  is  the  radiance  in  the  ith  channel  of  the  jth  pixel.  Each  row 
of  H  is  a  channel  image  and  each  column  contains  a  pixel  spectrum.  The  data  matrix  can  then  be  represented  by  a 
column  or  row  expansion. 
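As a small illustration of this layout, the sketch below flattens an image cube into a channels-by-pixels matrix. The function name `cube_to_matrix`, the nested-list representation and the row-major pixel ordering are assumptions made for the example, not details given in the text.

```python
def cube_to_matrix(cube):
    """Flatten a rows x cols x channels cube into H (channels x pixels)."""
    # Assumed pixel ordering: row-major over the scene.
    pixels = [spectrum for row in cube for spectrum in row]
    n_channels = len(pixels[0])
    # H[i][j] is the radiance in channel i of pixel j.
    return [[pixel[i] for pixel in pixels] for i in range(n_channels)]

# A 2 x 2 scene with 3 channels; the values are made up for illustration.
cube = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]
H = cube_to_matrix(cube)
channel_image = H[0]                    # a row of H: one channel over all pixels
pixel_spectrum = [row[1] for row in H]  # a column of H: the spectrum of pixel 1
```

Each row of `H` is then a channel image and each column a pixel spectrum, matching the description above.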

The expansion in a spectral or column basis, S, leads to

H = S F_N + R_N ,    (1)

where N is the expansion length. The matrix F_N is a matrix of expansion coefficients. It contains the contribution of
each basis spectrum to each pixel. The matrix R_N is the error or residual matrix resulting from truncation of the
expansion to a set of N basis functions.

The expansion in terms of a basis of images or rows, P, leads to

H = C_L P + R_L ,    (2)

where L is the expansion length. The matrix P is a matrix of single-channel images of H at channels selected as
endmembers (end-images) and the matrix C_L contains the expansion coefficients, the contributions of the L end-images
to each data image. The matrix R_L is the error or residual matrix resulting from the truncation of the expansion to L basis
images.

There  are  many  possibilities  for  the  explicit  selection  of  spectra,  or  images,  for  a  basis.  We  restrict  choices  to  spectra  and 
images  that  are  in  the  data  set.  The  selection  algorithm  is  given  in  detail  for  the  selection  of  columns  of  pixel  spectra. 

2.2  Algorithm  for  selecting  a  basis 

We  discuss  two  processes,  one  in  which  no  restrictions  are  placed  on  the  expansion  coefficients,  and  its  generalization  to 
constrain the expansion coefficients to be non-negative. In the former implementation, the selected vectors form a
highly  linearly  independent  basis  and  define  a  subspace  model  for  the  hyperspectral  data.  In  the  latter,  the  selected 
vectors  are  extreme  vectors  in  the  data.  The  extreme  vectors  form  a  convex  cone  within  their  subspace  and  model  the 
hyperspectral  data  that  lies  within  the  cone.  The  residuals  in  the  constrained  case  are  those  components  of  the  data  set 
that lie outside the convex cone. The primary difference between the unconstrained and constrained algorithms is the
projection  step,  which  leads  to  different  vectors  being  selected  in  cycles  following  the  occurrence  of  an  active  constraint. 
Orthogonal  projections  are  used  in  the  unconstrained  case  and  oblique  projections  are  used  in  the  constrained  case.  The 
criterion  used  to  determine  the  next  basis  vector  is  based  on  the  length  of  its  residual  in  the  current  model.  The  length  is 
the  distance  that  the  vector  lies  outside  the  subspace  defined  by  the  current  basis  for  the  unconstrained  case,  or  outside 
the  convex  cone  for  the  constrained  case. 

The  pixel  spectra  of  hyperspectral  scenes  are  large  sets  of  linearly  dependent  vectors  and  a  selection  process  is  needed  to 
find  a  set  of  linearly  independent  vectors  for  the  factorization,  or  set  of  extreme  vectors  for  the  convex  factorization.  The 
procedure  used  is  based  on  an  adaptation  of  the  augmented  modified  Gram  Schmidt  (AMGS)  method12,  a  sequential 
orthogonalization  algorithm.  There  are  two  steps  to  each  cycle  of  the  sequential  factorization  algorithm.  The  first  step 
selects  the  remaining  vector  among  the  data  vectors  that  is  least  well  approximated  by  the  currently  chosen  basis  of 
vectors,  as  the  next  vector  in  the  basis.  The  second  step  is  the  removal  of  the  projection  of  the  currently  chosen  vector 
from  all  of  the  remaining  data  vectors. 
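The two-step cycle can be sketched for the unconstrained (orthogonal) case as follows. This is a minimal illustration, not the authors' implementation: the function names, the stopping tolerance `tol` and the choice of the longest vector as the starting point are assumptions made for the example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def select_basis(columns, max_basis, tol=1e-10):
    """Pick columns one at a time; return their indices and the final residual norms."""
    residuals = [list(c) for c in columns]      # working copies of the data vectors
    q = []                                      # indices of the selected columns
    for _ in range(max_basis):
        norms = [math.sqrt(dot(r, r)) for r in residuals]
        best = max(range(len(norms)), key=norms.__getitem__)
        if norms[best] <= tol:                  # every vector is already modeled
            break
        q.append(best)                          # step 1: worst-modeled vector joins the basis
        w = list(residuals[best])
        ww = dot(w, w)
        for j, r in enumerate(residuals):       # step 2: remove its projection from all vectors
            o = dot(w, r) / ww                  # orthogonal projection coefficient
            residuals[j] = [rj - o * wj for rj, wj in zip(r, w)]
    return q, [math.sqrt(dot(r, r)) for r in residuals]

# Four made-up two-channel "spectra" (illustrative values only).
columns = [[3.0, 0.0], [1.0, 1.0], [0.0, 2.0], [2.0, 1.0]]
q, norms = select_basis(columns, max_basis=3)
```

In this toy example two columns are selected, after which every residual norm is zero to rounding, so the loop stops before the third cycle.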

We illustrate the method with an expansion based on pixel spectra, Equation (1). The spectra that form the
matrix S are selected from the columns {h_j} of H and are determined sequentially. The array, q(n), is used to store the
column indices of the data matrix that are chosen to form the columns of S, {s_k = h_{q(k)}, k = 1, ..., N}. A set of auxiliary
vectors {w_n, n = 1, ..., N} is used in the processing.


The initial vector from the set {h_j^0} can be selected by any criterion or at random. We choose it as the longest vector,
store its index in q(1), and set

w_1 = h_{q(1)}^0 .

The selected spectrum is removed from all of the vectors by projection,

h_j^1 = h_j^0 - w_1 F_{1,j}^1 ,   F_{1,j}^1 = \alpha_{1,j} O_{1,j} ,   O_{1,j} = (w_1^T h_j^0) / (w_1^T w_1) .
O_{1,j} is an orthogonal projection coefficient, F_{1,j}^1 is a general projection coefficient and \alpha_{1,j} is a scale factor. For
\alpha_{1,j} = 1, the projection is an orthogonal projection and the vector, h_j^1, is orthogonal to w_1. For \alpha_{1,j} \ne 1, the projection is
oblique. For the oblique case the vector, h_j^1, is not orthogonal to w_1. In both cases, the vectors are the residuals to an
approximation of H by the single basis function. The subsequent steps involve selection of a basis function and its
removal by projection from the columns of H. A numerically stable choice for the nth basis is any vector, h_j^{n-1}, whose
length is greater than some threshold. Its length is the residual norm of the vector in the n - 1 basis function model. In
our algorithm, we select the vector with the largest length, store its index in q(n) and set w_n = h_{q(n)}^{n-1}. The projection
process is repeated and the set of vectors {h_j^n} is formed as

h_j^n = h_j^{n-1} - w_n F_{n,j}^n ,   F_{n,j}^n = \alpha_{n,j} O_{n,j} ,   O_{n,j} = (w_n^T h_j^{n-1}) / (w_n^T w_n) .
The lengths of the vectors h_j^n are the residual norms for the approximation of the vectors with the basis set of n vectors.
The process is continued until N basis functions have been found and all of the residual norms of the vectors are below an
acceptable threshold. The set of vectors {h_j^N} forms the columns of the residual matrix R_N. The set {s_k, k = 1, ..., N} is the
selected basis set of columns of H.


If all of the scaling factors are set to one, \alpha_{n,j} = 1, the set {w_n, n = 1, ..., N} forms an orthogonal basis for H. It is an
augmented modified Gram Schmidt (AMGS) orthogonal basis. In our implementation of AMGS and convex
factorization, we track the original vectors throughout and use the non-orthogonal basis {s_n}. The expansion
coefficients are obtained and updated for the non-orthogonal basis at each selection and projection step. We introduce the
coefficient notation, F_{k,j}^n, for the value of the expansion coefficient of the kth basis vector in its expansion of the
jth column of H after its update on entry of the nth basis function to the set. When k = n, the expansion coefficient is
the projection coefficient given by Equation 9. For 1 \le k \le n - 1, F_{k,j}^n is an updated expansion coefficient of the
kth previously selected basis set member. Updates to the expansion coefficients on entry of the nth vector to the basis are
given by

F_{k,j}^n = F_{k,j}^{n-1} - F_{k,q(n)} F_{n,j}^n ,    (11)
where F_{k,q(n)} is the expansion coefficient in the approximation of w_n by the previously selected kth basis function. At
termination, after N cycles, the expansion coefficients of the selected basis vectors are updated to
The final set of coefficients, {F_{n,j}^N}, are the elements of the expansion matrix F_N for the non-orthogonal basis.

The set of vectors {h_j^N} forms the columns of the residual matrix, R_N. At the end of each cycle, a basis, a set of
expansion  coefficients  and  a  residual  matrix  are  available. 


The unconstrained algorithm uses orthogonal projections and computes an unconstrained least squares fit of the
hyperspectral data set in the basis {s_k}. These expansion coefficients are the least-squares expansion coefficients of the
original vectors in the non-orthogonal basis. The non-orthogonal basis is a linearly independent set. Our AMGS method
is similar to the ORASIS mode when shrink-wrapping is not applied.10

By using analogous procedures, a subset of images, {p_l}, expansion coefficients, {C_{i,l}}, and residuals, R_L, can be found
to  form  the  row  expansion  of  the  hyperspectral  data  matrix,  as  indicated  in  Equation  (2). 

2.3  Convex  Factorization  (CF) 

To  obtain  endmembers  of  the  hyperspectral  data  set,  extreme  point  vectors  are  sought  as  a  basis  set  {sk}.  Extreme 
vectors  are  unique  vectors  of  H  having  the  property  that  they  cannot  be  approximated  by  a  positive  linear  combination 
of  other  vectors  belonging  to  H  .  Non-extreme  vectors  can  be  approximated  by  a  positive  linear  combination  of  the 
extreme  vectors.  The  CF  procedure  uses  the  strategy  outlined  above  to  find  extreme  vectors.  However,  the  expansion 
coefficients  are  constrained  to  be  non-negative.  The  orthogonal  projections  must  be  replaced  with  oblique  projections  to 
satisfy  active  constraints.  Convex  cone  expansions  require  that  the  expansion  coefficients  are  non-negative,  while  convex 
hull  expansions  require  in  addition  that  the  expansion  coefficients  for  the  spectrum  or  image  model  sum  to  one.  For  the 
convex  cone  factorizations,  the  expansion  coefficients  for  end-spectra  and  end-images  satisfy 

F_{n,j} \ge 0 and C_{i,l} \ge 0, respectively.    (12)

The coefficients, C_{i,l}, form the matrix, C_L, of expansion coefficients for the ith channel image by the lth end-image,
Equation (2). These coefficients are computed in a totally analogous procedure to the procedure outlined here for the
expansion of pixel spectra by end-spectra.

For convex hull factorizations, the expansion coefficients must satisfy inequalities (12) and, in addition, satisfy the
equality constraints,

\sum_{n=1}^{N} F_{n,j} = 1 and \sum_{l=1}^{L} C_{i,l} = 1, for all pixels j and all images i, respectively.

A  convex  hull  factorization  corresponds  to  modeling  every  pixel  or  channel  image  of  the  hyperspectral  data  set  as  a 
weighted  average  of  the  basis  set.  It  is  an  interpolation.  A  convex  cone  factorization  leads  to  a  model  that  is  a  scaled 
weighted  average  or  scaled  interpolation.  An  unrestricted  expansion,  with  no  constraints  on  the  coefficients,  creates  a 
model  that  can  lie  anywhere  in  the  subspace  defined  by  the  basis. 
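The three kinds of expansion can be summarized side by side for the model of a single pixel spectrum. The block below is a restatement of the constraints described above in the notation of this section, not an additional result.

```latex
h_j \approx \sum_{n=1}^{N} s_n F_{n,j}, \qquad
\begin{cases}
F_{n,j} \in \mathbb{R} & \text{unrestricted (subspace model)} \\
F_{n,j} \ge 0 & \text{convex cone} \\
F_{n,j} \ge 0,\ \sum_{n=1}^{N} F_{n,j} = 1 & \text{convex hull}
\end{cases}
```

For example, with two end-spectra and illustrative coefficients 0.6 and 0.9, the coefficients sum to 1.5, so the cone model equals 1.5 times the weighted average with weights 0.4 and 0.6. This is the sense in which a cone model is a scaled weighted average.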

We describe here how the constraints are maintained in the sequential process that simultaneously determines the
end-spectra and the expansion coefficients, F_N. An analogous procedure is used to find end-images and the expansion
coefficients, C_L. The first step to finding the end-spectra is the same as the step described in Equation (7). After a
spectrum has been selected, it is removed from all other vectors of H by projection. In the projection step, the orthogonal
projection coefficient, O_{n,j}, is found. It is modified by scaling, as necessary, changing the projection to an oblique
projection to satisfy the constraints using the procedure outlined below. The oblique projection is illustrated in Figure 1.

If the orthogonal projection coefficient is not positive, this extreme vector cannot contribute to a convex model of this
data vector, and the expansion coefficient is set to zero.

If O_{n,j} \le 0, set \alpha_{n,j} = 0 and F_{n,j}^n = 0.


If  the  orthogonal  projection  coefficient  is  positive,  testing  for  possible  modification  to  an  oblique  projection  must  occur 
during  the  update  step  of  the  expansion  coefficients  of  previously  selected  extreme  vectors.  If  none  of  the  updates  would 
lead  to  a  negative  coefficient,  no  constraints  are  active  and  the  orthogonal  projection  coefficient  is  used.  If  one  or  more 
of  the  updated  coefficients  would  become  negative,  a  constraint  is  active.  An  oblique  projection  is  used  to  remove  the 
most  offending  previous  extreme  vector  from  the  model.  This  processing  proceeds  as  follows: 

Find the smallest value, v_min, of the set

{v_k = F_{k,j}^{n-1} / (F_{k,q(n)} O_{n,j})} for all previous extreme vectors, k, in the model of the new end-spectrum, column q(n).

If v_min \ge 1, no constraints are active and the orthogonal projection coefficient is valid; set \alpha_{n,j} = 1 and F_{n,j}^n = O_{n,j}.

If v_min < 1, then a constraint is active and the end-spectrum, s_k, with v_k = v_min will be removed from the model of
pixel j. Set \alpha_{n,j} = v_min. The oblique projection coefficient is set to F_{n,j}^n = \alpha_{n,j} O_{n,j}.

With this value of F_{n,j}^n, the previous expansion coefficients are updated using Equation (11).
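The constraint test for a single pixel can be sketched as below. This is a hedged illustration rather than the authors' code: the function name `constrained_step` is hypothetical, the update of the earlier coefficients is assumed to take the form F_{k,j}^n = F_{k,j}^{n-1} - F_{k,q(n)} F_{n,j}^n, and only previous endmembers with a positive coefficient F_{k,q(n)} are tested, which is an assumption made to avoid division by zero.

```python
def constrained_step(o_nj, f_prev, f_new):
    """One constrained projection step for pixel j on entry of end-spectrum n.

    o_nj   : orthogonal projection coefficient O_{n,j}
    f_prev : coefficients F_{k,j}^{n-1} of the earlier end-spectra for this pixel
    f_new  : coefficients F_{k,q(n)} of the earlier end-spectra in the model of
             the newly selected column q(n)
    Returns (alpha, F_nj, updated earlier coefficients).
    """
    if o_nj <= 0.0:
        # The new end-spectrum cannot contribute to a convex model of this pixel.
        return 0.0, 0.0, list(f_prev)
    # v_k = F_{k,j}^{n-1} / (F_{k,q(n)} * O_{n,j}); v_k < 1 means coefficient k would go negative.
    ratios = [fp / (fn * o_nj) for fp, fn in zip(f_prev, f_new) if fn > 0.0]
    v_min = min(ratios, default=float("inf"))
    alpha = 1.0 if v_min >= 1.0 else v_min      # oblique projection only when a constraint is active
    f_nj = alpha * o_nj
    # Assumed update rule; max() only guards against tiny negative rounding errors.
    updated = [max(fp - fn * f_nj, 0.0) for fp, fn in zip(f_prev, f_new)]
    return alpha, f_nj, updated

# Illustrative numbers only: the first earlier coefficient would go negative.
alpha, f_nj, updated = constrained_step(0.5, [0.2, 0.6], [0.8, 0.4])
```

With these numbers the scale factor is reduced to 0.5, the first earlier end-spectrum drops out of the pixel model (its coefficient becomes zero), and the second keeps a positive coefficient.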
Figure 1. Illustration of orthogonal and oblique projections. The vertical dotted line is the residual vector for the jth data vector after
orthogonal projection onto the nth basis vector. The oblique dashed line is the residual vector after an oblique projection onto the nth
basis vector. The length of the residual vector is its residual norm.

Geometrically,  when  a  constraint  is  active,  the  projection  is  oblique,  as  illustrated  in  Figure  1.  The  lengths  of  the  residual 
vectors  for  the  orthogonal  and  oblique  projections  illustrate  that  constraints  lead  to  larger  residual  vectors.  The  convex 
projection  will  be  as  close  as  possible  to  the  orthogonal  projection  while  maintaining  the  positive  constraints.  It  differs 
from  the  orthogonal  projection  only  when  a  constraint  is  active.  An  active  constraint  leads  to  one  of  the  previous 
extreme  vectors  being  removed  from  the  expansion.  If  no  constraints  are  active,  the  orthogonal  projection  is  used  and 
the  new  extreme  vector  is  added  to  the  description  of  the  data  vector  with  no  prior  one  being  removed. 


The convex factorization results differ from those of the unconstrained factorization in several ways. The vectors, w_n,
obtained in the convex factorization do not form an orthogonal set. The expansion coefficients are not the optimal
constrained least squares expansion coefficients. Rather, they are the stepwise-constrained least-squares coefficients for
the fixed common order of selection of the basis, which is the same for all pixels. The selected vectors are a set of
extreme vectors of the data set. Several of the columns selected may differ from those obtained for the orthogonal case
where  the  basis  is  selected  based  on  linear  independence  criteria.  The  extreme  vectors  need  not  be  linearly  independent. 
In  general,  there  are  more  extreme  points  in  a  data  set  than  the  rank.  Our  algorithm  is  often  stable  when  the  rank  is 
exceeded  since  the  expansion  coefficients  are  computed  simultaneously  and  the  pixel  spectra  are  computed  using  a 
subset  of  the  basis.  Numerical  problems  occur  only  if  the  dimension  of  the  subset  selected  to  model  an  individual  pixel 
spectrum  exceeds  the  rank. 


3.  APPLICATIONS 

SMACC was applied to extract end-spectra and end-images of a 200 by 200 subset of an AVIRIS scene of Stennis AFB,
Mississippi. The data set contains 224 channels from 0.4 µm to 2.4 µm. The data were atmospherically corrected by
FLAASH13. An image of the data set is illustrated in Figure 2.


Figure  2.  An  image  from  the  AVIRIS  data  set  for  the  Stennis  site.  The  numbers  on  the  image  indicate  the  approximate  location  of  the 
pixels selected as the first twenty-five end-spectra.

3.1  Determination  of  end-spectra 

Fifty  end-spectra  were  sought  together  with  their  abundances.  The  approximate  locations  of  the  first  25  endmembers 
selected  are  indicated  by  the  numerals  on  the  image  in  Figure  2.  The  abundances  were  constrained  to  be  positive  but  no 
restriction  was  placed  on  their  sum.  The  first  end-spectrum  was  chosen  as  the  brightest  spectrum.  It  is  a  localized  object 
at a pixel location near the large structure at the bottom of the image. End-spectra 3, 4, 8, 13 and 47 are located in
vegetated regions. Their spectra are illustrated in Figure 3. Their contributions to the scene pixels are illustrated through
their abundance maps, Figure 4. While some of the vegetation spectra have similar shapes, their spatial distributions
vary. End-spectra 5, 9, 11, 12, 17, 30 and 39 represent roads, paths, dirt and soil. The spectra and abundance maps are
illustrated in Figures 5 and 6, respectively. The remaining end-spectra are of isolated objects in the scene. Some
examples of end-spectra and their abundances are illustrated in Figures 7 and 8, respectively. These spectra represent
localized objects. Most pixels are described by a small subset of the end-spectra. The most frequent situations are pixel
models of two, three or four end-spectra. 75% of the pixels are described by four or fewer end-spectra. Only 1% are
modeled  by  combinations  of  more  than  10  end-spectra.  The  abundances  show  that  the  end-spectra  3  and  4  are  the  most 
ubiquitous  contributors  to  the  vegetation  in  the  scene.  However,  they  contribute  to  separate  pixel  models.  Many 
vegetation  pixel  spectra  are  described  as  blends  of  the  end-spectra  4  and  8  or  4,  8  and  47.  Other  vegetation  pixel  spectra 
are described as blends of the end-spectra 3 and 13. The thin covering of vegetation in the lower left portion of the image
is  modeled  by  end-spectra  3  and  13  with  contributions  of  dirt  and  soil  from  end-spectra  9  and  30.  End-spectrum  5  is  the 
most  ubiquitous  non-vegetation  contributor  to  the  scene.  Several  pixel  spectra  are  modeled  by  combinations  of  the  end- 
spectra  5,  9  and  17.  Other  pixel  spectra  are  modeled  by  combinations  of  end-spectra  5  and  11  and  5,  11  and  12.  The 
pixel spectra near edges of roadways are modeled by combinations of end-spectrum 5 with either end-spectrum
11  or  12  with  roadside  vegetation  contributions  from  end-spectrum  3.  Also,  end-spectrum  5  shares  in  contributing  to 
those  pixel  spectra  where  end-spectrum  39  has  abundance  on  the  bottom  and  on  the  left  of  the  scene.  See  Figures  6a  and 
6h.  End-spectra  1,  2,  6,  7,  10,  11,  20,  23,  37  and  50  contribute  to  models  of  moderate  to  small  localized  features  in  the 
scene. The magnitudes of the reflectance of end-spectra 1, 2, 6, 7 and 37, illustrated in Figure 7a, indicate that they model
moderately to highly reflective materials. End-spectra 10, 20, 23 and 50, illustrated in Figure 7b, have low reflectance,
and the pixels they model are rather dim. Nevertheless, the low reflectance end-spectra contain unique spectral features
and contribute strongly to specific localized features in the scene. Their abundance maps are illustrated in Figure 8.
Several of these features are in the lower right corner of the image. The remaining end-spectra contribute to models of
small  local  features  and  scattered  isolated  pixels. 


Figure  3.  The  vegetation  spectra  selected  as  end-spectra.  End-spectra  3,  4,  8,  13  and  47  are  shown. 


Figure  4.  Abundance  maps  for  vegetation  end-spectra,  (a)  3,  (b)  4,  (c)  8,  (d)  13  and  (e)  47. 
Figure 5. The end-spectra selected to represent roads, paths, dirt and soil. End-spectra 5, 9, 11, 12, 17, 24, 30 and 39 are shown.


Figure  6.  Abundance  maps  for  end-spectra  selected  to  represent  roads,  paths,  dirt  and  soil,  (a)  5,  (b)  9,  (c)  11,  (d)  12,  (e)  17,  (f)  24,  (g) 
30  and  (h)  39  are  shown. 
Figure 7. Selected end-spectra that represent localized, intermediate-sized objects in the scene. The end-spectra 1, 2, 6, 7 and 37
illustrated in Figure 7a are from high-reflectance objects. The end-spectra 10, 20, 23 and 50 in Figure 7b are from low-reflectance
objects.
Figure 8. Abundance maps for selected end-spectra that represent localized, intermediate-sized objects in the scene: (a) 1, (b) 2, (c) 6,
(d) 7, (e) 10, (f) 20, (g) 23, (h) 37 and (i) 50.


The  residual  norms,  induced  by  truncating  the  expansion  at  50  endmembers,  are  small.  The  pixel  spectrum  with  the 
largest  residual  norm  and  the  model  spectrum  that  approximates  it  are  illustrated  in  Figure  9.  This  pixel  spectrum  would 
be  the  one  selected  as  the  fifty-first  end-spectrum. 
Figure 9. The pixel spectrum that has the largest residual norm, after the first fifty end-spectra were selected, and the model spectrum.
This  pixel  spectrum  would  have  been  selected  as  the  fifty-first  end-spectrum  if  the  process  were  continued  beyond  fifty. 

3.1.1  Fractional  abundances 

A  convex  cone  model  does  not  place  a  constraint  on  the  sum  of  abundances.  A  connection  between  the  convex  cone 
model  and  a  convex  hull  model  can  be  obtained  by  assuming  the  null  vector  as  an  extra  end-spectrum.  The  null  vector 
can  be  considered  as  representing  totally  shadowed  or  dark  material.  Then  all  of  the  convex  cone  models  whose 
abundances  sum  to  less  than  one  belong  to  the  hull  of  the  simplex  formed  from  the  extreme  vectors  of  the  data  and  the 
null vector. In the above example calculation of end-spectra of the Stennis scene, 96% of the pixels are modeled by the
simplex, with the remaining 4% having abundance sums that exceed unity. Most of the 4% have sums that are close to
one, but a few are larger, with the largest value equal to 1.2. Variations in brightness arise from illumination effects,
weathering and moisture content. The use of a null vector is a simple way to account for variability that changes the
brightness scale more or less uniformly over all channels.
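The role of the null vector can be illustrated with a short sketch. The function name `dark_fraction` and the example abundances are hypothetical; the only idea taken from the text is that a cone model whose abundances sum to no more than one can be completed to a sum of one by assigning the remainder to the null (dark) end-spectrum.

```python
def dark_fraction(abundances):
    """Abundance left for the null (dark) end-spectrum, or None if the sum exceeds one."""
    total = sum(abundances)
    if total > 1.0:
        return None          # pixel lies outside the simplex that includes the null vector
    return 1.0 - total

inside = dark_fraction([0.5, 0.25])     # abundances sum to 0.75
outside = dark_fraction([0.75, 0.5])    # abundances sum to 1.25
```

Because the null vector adds nothing to the modeled spectrum, assigning it the remaining abundance leaves the pixel model unchanged while making the abundances sum to one.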

An alternative is to modify the abundances to reflect the fractional contribution of each end-spectrum's intensity to the pixel
spectral intensity. A fractional abundance is an estimate of the fraction of integrated area under the radiance curve of a
pixel that is contributed by the end-spectrum. To compute fractional abundance, each pixel spectrum (including those of
end-spectra) is normalized by the area under its curve. Then the expansion coefficients in the model of a pixel are the
fractional contributions to the integrated radiance of the corresponding end-spectra. Summing the channel radiances in
the non-extreme pixel and the end-member spectra leads to

$$\sum_{n=1}^{N} I_n F_n \approx I_{\text{pixel}},$$

where $I_{\text{pixel}}$ is the sum of the channel radiances in the pixel and $I_n$ is the sum of the channel radiances in the nth
end-spectrum. The equality holds in the least-squares sense, assuming that the channel residuals sum to zero. The fractional
contribution of radiance from end-member n to the pixel radiance is the quantity $I_n F_n / I_{\text{pixel}}$. This relation holds
regardless of whether the sum-to-one constraint is applied and active. To obtain an estimate of the pixel fill or material
abundance in the pixel, it is necessary to address the variability in material spectra that exists in the scene.
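
As a worked illustration of the quantity $I_n F_n / I_{\text{pixel}}$, the sketch below computes the fractional radiance contributions for one pixel. The function name `fractional_contributions` and the toy spectra are assumptions; the abundances $F_n$ are taken as given, and the pixel is constructed to be an exact combination of the end-spectra so that the residuals are zero.

```python
def fractional_contributions(end_spectra, abundances, pixel):
    """Return I_n * F_n / I_pixel for each end-spectrum n.

    end_spectra[n] is the nth end-spectrum, abundances[n] is F_n, and
    pixel is the measured pixel spectrum (all over the same channels).
    """
    end_totals = [sum(spectrum) for spectrum in end_spectra]  # I_n
    pixel_total = sum(pixel)                                  # I_pixel
    return [i_n * f_n / pixel_total for i_n, f_n in zip(end_totals, abundances)]


# Toy data (hypothetical): two end-spectra over three channels.
end_spectra = [[2.0, 4.0, 2.0], [4.0, 0.0, 4.0]]
abundances = [0.25, 0.25]
# Pixel built as 0.25 * first + 0.25 * second end-spectrum.
pixel = [1.5, 1.0, 1.5]
fractions = fractional_contributions(end_spectra, abundances, pixel)
print(fractions, sum(fractions))
```

The abundances sum to 0.5, yet the fractional contributions sum to one, which illustrates why the relation does not depend on the sum-to-one constraint.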


3.2  End-image  Expansions 

The  hyperspectral  image  was  also  expanded  in  channels  rather  than  pixels,  utilizing  Equation  (2).  The  output  is  a  set  of 
image  frames  of  the  most  unique  channels  and  the  abundances  or  contributions  that  these  make  to  non-extreme  images. 
Fifteen end-images and their abundances for the Stennis scene were obtained. The abundance calculations were
constrained  to  model  the  non-extreme  images  within  the  convex  cone  of  the  end-images.  The  abundance  contributions  of 
the  end-images  are  illustrated  in  Figure  10.  The  peaks  in  these  abundance  curves  occur  at  the  channels  of  the  selected 
end-images.  The  SMACC  algorithm  tends  to  select  the  brightest  image  among  a  group  of  nearly  identical  images.  The 
abundance  curves  provide  detailed  information  on  the  unique  spectral  regions  and  the  high  redundancy  in  the  spectral 
data. 


Figure  10.  The  abundance  spectra  of  the  first  fifteen  end-images  selected  by  SMACC. 


There  are  spectral  bands  where  a  single  end-image  dominates.  In  these  bands,  the  non-extreme  channel  images  are,  to  a 
fairly  high  degree  of  accuracy,  scaled  copies  of  the  end-image.  The  end-image  at  389  nm  dominates  in  contributions  to 
channel  images  from  379  nm  to  438  nm.  End-image  673  nm  dominates  channel  images  from  556  nm  to  702  nm.  End- 
image  788  nm  dominates  channel  images  from  740  nm  to  788  nm.  End-image  798  nm  contributes  to  a  very  narrow  band 
around  798  nm.  End-image  846  nm  dominates  from  836  nm  to  884  nm,  end-image  1076  nm  from  1038  nm  to  1115  nm. 
The end-image at 1284 nm dominates from 1173 nm to 1334 nm. The abundance spectrum of the end-image at 1425 nm
has two spikes, one at 1425 nm and another at 1803 nm. The end-image at 1664 nm dominates from
1604 nm to 1753 nm. The abundance spectrum of the end-image at 1948 nm has a single narrow peak around 1948 nm. The end-image at
2079  nm  dominates  in  the  representation  of  the  channel  images  from  1989  nm  to  2299  nm.  The  abundance  spectra  of  the 
end-images  at  2478  nm,  2488  nm,  2498  nm  and  2508  nm  have  single  spikes  at  the  end-image  wavelengths.  The  end- 
images  with  narrow  peaks  in  spectral  abundance  are  near  atmospheric  absorption  features.  The  reflectance  spectra  at 
these  channels  still  have  residual  atmospheric  spectral  features  that  make  each  channel  unique.  The  abundance  spectra  of 
these  end-images  only  make  small  contributions  to  other  channels.  In  spectral  regions  that  are  not  dominated  by  a  single 
end-image,  most  of  the  non-extreme  channel  images  are  described  by  a  convex  model  of  two  to  three  end-images. 

False-color RGB images constructed from trios of end-images, or from images selected from different bands where one
end-image dominates, provide enhanced visualization of the data. For example, selecting the first three end-images, 2478 nm,
1285 nm and 2079 nm, as “red-green-blue”, respectively, results in the false-color image illustrated in Figure 11a.
Selecting end-images that dominate the contributions to a range of wavelengths about their values, for example the
end-images at 1077 nm, 846 nm and 673 nm as “red-green-blue”, respectively, gives the image illustrated in Figure 11b.
Selecting end-images whose abundances are narrow spikes about the central value, for example 1948 nm, 788 nm and 1425 nm as
“red-green-blue”, respectively, gives the image shown in Figure 11c. Alternatively, since an arbitrary wavelength can be chosen for the
first channel image, a false-color image can be constructed from the selected first image and the next two selected by the
SMACC algorithm. For example, if a blue channel at 428 nm is selected, SMACC then selects the channel images at 1077 nm
and 2079 nm as the next two end-images. The “red-green-blue” image formed from 2079 nm, 1077 nm and 428 nm, respectively, is
illustrated in Figure 11d. A red-green-blue image created from the visible wavelengths 635 nm, 536 nm and 428 nm,
respectively, is included for comparison in Figure 11e. The false-color images produced using the end-images accent
differences and small features when compared to channels more closely associated with the visible. Note that the images
in Figures 11a, 11b, 11c and 11d enhance the distinction between vegetation types over the more natural visible
red-green-blue color scheme. The channels selected are specific to a scene or a scenario, and many scenes need to be
analyzed before the band selection process can be utilized for sensor design.
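
Building such a composite from three selected channel images can be sketched as follows. This is a minimal illustration, not the procedure used to produce Figure 11: the helper names (`nearest_channel`, `stretch`, `false_color`), the per-channel min-max stretch to 0..255, and the 2 x 2 toy cube are all assumptions.

```python
def nearest_channel(wavelengths, target_nm):
    """Index of the channel whose wavelength is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda k: abs(wavelengths[k] - target_nm))


def stretch(image):
    """Min-max stretch a 2-D image (list of rows) to integers in 0..255."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant image
    return [[round(255 * (v - lo) / span) for v in row] for row in image]


def false_color(cube, wavelengths, rgb_nm):
    """Return rows of (r, g, b) tuples; cube[k] is the image at wavelengths[k]."""
    planes = [stretch(cube[nearest_channel(wavelengths, nm)]) for nm in rgb_nm]
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[tuple(p[i][j] for p in planes) for j in range(cols)] for i in range(rows)]


# Toy cube (hypothetical values): three channels of a 2 x 2 scene.
wavelengths = [428, 1077, 2079]
cube = [
    [[0.0, 4.0], [4.0, 0.0]],      # 428 nm
    [[10.0, 10.0], [10.0, 30.0]],  # 1077 nm
    [[10.0, 0.0], [0.0, 10.0]],    # 2079 nm
]
image = false_color(cube, wavelengths, rgb_nm=(2079, 1077, 428))
print(image)
```

Each channel is stretched independently here, which exaggerates contrast; a shared or percentile-based stretch is an equally reasonable choice.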
Figure 11. Five false-color images selected from the data. (a) Red, green, blue as 2478 nm, 1285 nm and 2079 nm, respectively: the
first three end-images selected by SMACC. (b) Red, green, blue as 1077 nm, 846 nm and 673 nm, respectively: end-images that
model a broad range of wavelengths. (c) Red, green, blue as 1948 nm, 788 nm and 1425 nm, respectively: end-images with narrow
spikes in spectral abundance. (d) Red, green, blue as 2079 nm, 1077 nm and 428 nm, respectively: the best images to combine with the
blue image. (e) Red, green, blue as channels in the visible bands, 635 nm, 536 nm and 428 nm, respectively.

4.  CONCLUSIONS 

Matrix factorization provides a powerful tool for analysis of the highly redundant pixel spectra and channel images of
hyperspectral  data  sets.  Our  convex  approach  with  constraints  requiring  positive  abundances  and  constraints  on  the 
maximum  number  of  endmembers  for  a  pixel  model  provides  a  detailed  physical  description  of  the  spatial  and  spectral 
features  of  hyperspectral  imagery.  The  approach  can  extract  spectra  that  account  for  environmental  and  illumination 
variations  in  the  spectral  data  and  model  the  variations  in  the  non-extreme  spectra.  The  convex  factorization  approach 
finds  small  subsets  of  end-spectra  to  model  the  material  types  and  variations  autonomously.  End-spectra  that  describe 
localized  features  are  candidate  anomalous  spectra  for  processing  with  detection  algorithms.  Convex  factorization 
applied  to  channel  images  determines  the  spectral  bands  in  the  data  where  images  are  highly  correlated.  Sets  of  images 
within  these  bands  are  nearly  scaled  copies  of  each  other.  Images  within  these  bands  could  be  co-added  to  increase  the 
signal-to-noise  ratio.  Images  from  separate  bands  can  be  selected  to  enhance  visualization  of  spatial  spectral  boundaries 
in  the  data. 